Cell types separate well in the UMAP. The largest cluster corresponds to the Plasma cells, consistent with the number of samples that contain only plasma cells.
Unfortunately, the cells with a high mtRNA content still cluster together. Let's check some quality-metric plots and the markers expressed by the cells that show low counts and high mtRNA:
Let's select the cells with more than 10% mtRNA and check their distribution, then their most highly expressed genes.
Cells with > than 15 % mtRNA: 10424
Some cells still cluster by mtRNA content, so let's lower the threshold a little.
Cells with > than 10 % mtRNA: 28701
After removing the cells with more than 10% mtRNA, the mtRNA-driven clustering is no longer visible. Since many cells are removed (28701), let's see whether it is possible to subset a group of cells with high counts whose most highly expressed genes are not mainly mt genes.
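The threshold comparison above can be sketched as follows. This is only a minimal illustration on made-up values, not the code used in this notebook: `count_above`, `keep_mask` and `toy_percent_mt` are hypothetical names, and the toy array stands in for the `percent_mt` column of `obs`.

```python
import numpy as np

def count_above(percent_mt, threshold):
    """Number of cells whose mtRNA percentage is strictly above the threshold."""
    percent_mt = np.asarray(percent_mt, dtype=float)
    return int((percent_mt > threshold).sum())

def keep_mask(percent_mt, threshold):
    """Boolean mask of the cells to keep (mtRNA percentage at or below the threshold)."""
    return np.asarray(percent_mt, dtype=float) <= threshold

# Made-up percentages for six cells (assumption, for illustration only).
toy_percent_mt = np.array([2.0, 8.5, 10.0, 12.3, 15.0, 22.7])

# Compare the two thresholds discussed in the text.
for thr in (15, 10):
    print(f"Cells with > {thr} % mtRNA: {count_above(toy_percent_mt, thr)}")

mask = keep_mask(toy_percent_mt, 10)
print(f"Cells kept at the 10 % threshold: {int(mask.sum())}")
```

On an AnnData object the same mask would presumably be built from `adata.obs['percent_mt']` and applied as `adata[mask].copy()`.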
Genes highly expressed across all the cells with > 10% mtRNA
The most highly expressed genes are mainly mtRNAs, ribosomal protein genes and MALAT1, so I decided to remove these cells.
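One way to rank the highly expressed genes in a subset of cells is to compute, for each gene, the mean fraction of a cell's counts that it accounts for. The sketch below does this on a small made-up count matrix. The function name, gene list and counts are assumptions for illustration; the plot above may well have been produced differently, for example with scanpy's `sc.pl.highest_expr_genes`.

```python
import numpy as np

def top_expressed_genes(counts, gene_names, n_top=3):
    """Rank genes by the mean fraction of per-cell counts they account for.

    counts: cells x genes matrix of raw counts.
    Returns a list of (gene, mean_fraction) pairs, highest first.
    """
    counts = np.asarray(counts, dtype=float)
    totals = counts.sum(axis=1, keepdims=True)
    totals[totals == 0] = 1.0           # avoid division by zero for empty cells
    fractions = counts / totals         # per-cell fraction of counts for each gene
    mean_fraction = fractions.mean(axis=0)
    order = np.argsort(mean_fraction)[::-1][:n_top]
    return [(gene_names[i], float(mean_fraction[i])) for i in order]

# Made-up counts for three high-mtRNA cells (assumption, for illustration only).
toy_genes = ["MT-CO1", "MT-ND4", "RPL13", "MALAT1", "CD79A"]
toy_counts = np.array([
    [40, 20, 10, 25, 5],
    [30, 30, 20, 15, 5],
    [50, 10, 10, 30, 0],
])

top = top_expressed_genes(toy_counts, toy_genes, n_top=3)
for gene, frac in top:
    print(f"{gene}: {frac:.3f}")
```

If the top of this ranking is dominated by mitochondrial and ribosomal genes, as in the toy data, the subset is unlikely to carry much cell-type signal.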
AnnData object with n_obs × n_vars = 319816 × 48361
obs: 'orig.ident', 'nCount_RNA', 'nFeature_RNA', 'percent_mt', 'batch', 'label', 'selected_cells'
var: 'gene_ids'
uns: 'neighbors', 'umap', 'label_colors', 'selected_cells_colors'
obsm: 'X_umap', 'X_pca'
obsp: 'distances', 'connectivities'